Exploiting the High Predictive Power of Multi-class Subgroups

Authors

  • Tarek Abudawood
  • Peter A. Flach
Abstract

Subgroup discovery aims at finding subsets of a population whose class distribution is significantly different from the overall distribution. A number of multi-class subgroup discovery methods have previously been investigated, proposed and implemented in the CN2-MSD system. When a decision tree learner was applied using the induced subgroups as features, it led to the construction of accurate and compact predictive models, demonstrating the usefulness of the subgroups. In this paper we show that, given a significant, sufficient and diverse set of subgroups, no further learning phase is required to build a good predictive model. Our systematic study bridges the gap between rule learning and decision tree modelling by proposing a method which uses the training information associated with the subgroups to form a simple tree-based probability estimator and ranker, RankFree-MSD, without the need for an additional learning phase. Furthermore, we propose an efficient subgroup pruning algorithm, RankFree-Pruning, that prunes unimportant subgroups from the subgroup tree in order to reduce the number of subgroups and the size of the tree without decreasing predictive performance. Despite the simplicity of our approach, we show experimentally that its predictive performance is in general comparable to that of other decision tree and rule learners on 10 multi-class UCI data sets.
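
The idea of estimating class probabilities directly from the training information attached to subgroups can be illustrated with a minimal sketch. This is not the authors' RankFree-MSD algorithm, whose details the abstract does not give. It assumes that each subgroup is a boolean predicate, that instances covered by the same set of subgroups share a leaf, and that leaf class counts are Laplace-smoothed. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def coverage_key(instance, subgroups):
    """Return a tuple of booleans: which subgroups cover the instance."""
    return tuple(bool(rule(instance)) for rule in subgroups)

def fit_leaf_counts(instances, labels, subgroups):
    """Group training instances by coverage pattern and count classes per group."""
    leaves = defaultdict(Counter)
    for x, y in zip(instances, labels):
        leaves[coverage_key(x, subgroups)][y] += 1
    return leaves

def predict_proba(instance, subgroups, leaves, classes):
    """Laplace-smoothed class distribution of the leaf the instance falls into."""
    counts = leaves.get(coverage_key(instance, subgroups), Counter())
    total = sum(counts.values()) + len(classes)
    return {c: (counts[c] + 1) / total for c in classes}

# Hypothetical subgroups and toy training data (assumptions for illustration).
subgroups = [lambda x: x["petal"] < 2.0, lambda x: x["petal"] > 5.0]
X = [{"petal": 1.4}, {"petal": 1.5}, {"petal": 4.5}, {"petal": 5.8}]
y = ["a", "a", "b", "c"]
classes = ["a", "b", "c"]

leaves = fit_leaf_counts(X, y, subgroups)
probs = predict_proba({"petal": 1.3}, subgroups, leaves, classes)
```

No model is trained beyond counting: the subgroups partition the data, and the stored counts serve both as probability estimates and as scores for ranking instances by class.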

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...

Combined Economic and Emission Dispatch Solution Using Exchange Market Algorithm

This paper proposes the exchange market algorithm (EMA) to solve combined economic and emission dispatch (CEED) problems in thermal power plants. The EMA is a new, robust and efficient algorithm for finding the global optimum in optimization problems. Its two search operators give it a strong ability to locate the global optimum. In order to show the capabilities ...

First-Order Multi-class Subgroup Discovery

Subgroup discovery is concerned with finding subsets of a population whose class distribution is significantly different from the overall distribution. Previously subgroup discovery has been predominantly investigated under the propositional logic framework. This paper investigates multi-class subgroup discovery in an inductive logic programming setting, where subgroups are defined by conjuncti...

The Advantages of Seed Examples in First-Order Multi-class Subgroup Discovery

Subgroup discovery is halfway between predictive and descriptive rule learning: while there is a target concept, the goal of subgroup discovery is not necessarily to achieve high accuracy in predicting the target, but rather to identify subsets of the population whose class distribution is significantly different from the overall distribution. The target concept helps us to achieve a trade-off ...

Comparison of the Predictive Power of Artificial Neural Networks and Multiple Logistic Regression in Discriminating Diabetic Patients with Retinopathy from Those without Retinopathy

Background: Diabetes mellitus is a highly prevalent disease in the population, and if not controlled, it causes complications and irreparable damage to the eye and can lead to blindness. The goal of this study is to investigate the predictive power of the multiple logistic regression model and the Artificial Neural Network Multi-layer Perceptron (MLP) in determining patients with and without diabetic...

Publication date: 2010